Final presentation¶

This is the final submission on the phenomenon of Earth's lightning hotspots. As this is the last submission, it mostly covers the algorithms used during the semester that returned correct results. All plots are present, and all the code that was used is here. Let's load in all the needed packages!

Goal #1: getting the hotspots of the continents¶

At this point the simplified output has been created, which makes the workflow faster. The only steps that remain are:

  1. Create our first figure with a 2D histogram from the yearly data
  2. Create a stable method that can find peaks in it (may need smoothing)
  3. Detect the hotspots on each continent

This was the point of the key paper, so let's see how close we can get to it!

Simplified output¶

Its source is the .h5 files available on kooplex. It does not need to be recreated, as I will hand it out. The problem is that the .h5 files are messy to handle: they hold an enormous amount of data and record a dozen parameters each. That's why the simplified output was created during the semester: we only need a fraction of the data. Later on we will discover another issue, hidden in plain sight...

In [36]:
# Load the simplified data
import numpy as np
import pandas as pd

col_names = ["time", "long", "lat"]
simplified_data = pd.read_csv("../data/simplified/simplified.csv", skiprows=4, delimiter=" ", names=col_names)
print(simplified_data.info(), simplified_data.values.shape)

# 1-degree bin edges (kept for reference, not used for the histogram below)
xedges_0 = np.arange(-180, 181)
yedges_0 = np.arange(-54, 55)

# Finer bin edges; the resolution is arbitrarily chosen
xedges_1 = np.linspace(-180, 180, int(12742/6/2) + 1)
yedges_1 = np.linspace(-54, 54, int(12742/6/2 * (54/180)) + 1)

# Generate the 2D histogram (first axis: longitude, second axis: latitude)
HIST2D, x_edges, y_edges = np.histogram2d(simplified_data["long"].values, simplified_data["lat"].values,
                                          bins=(xedges_1, yedges_1))
print(HIST2D.shape)
HIST2D = HIST2D.T   # transpose so rows are latitude and columns are longitude, as imshow and pcolormesh expect
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 798991 entries, 0 to 798990
Data columns (total 3 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   time    798991 non-null  float64
 1   long    798991 non-null  float64
 2   lat     798991 non-null  float64
dtypes: float64(3)
memory usage: 18.3 MB
None (798991, 3)
(1061, 318)
In [61]:
plot_my_first_figure()

Top 10 most thunderous spots for each continent¶

Many masks are needed here; they are built from shapefiles, which give reasonably well-defined continent outlines. The following are used:

  1. The shapefile to define the surfaces
  2. A gaussian filter to determine hotspots
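
The two ingredients above can be sketched roughly as follows. This is a minimal illustration, not the exact code behind the figure: the function name `find_hotspots`, the use of `scipy.ndimage`, and the values of `sigma`, `window` and `top_n` are all assumptions, and the continent mask is taken as a ready-made boolean array rather than derived from a shapefile.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter


def find_hotspots(hist2d, mask=None, sigma=2.0, window=15, top_n=10):
    """Return (row, col, smoothed_value) for the strongest local maxima.

    hist2d : 2D array of flash counts (rows = latitude bins, cols = longitude bins)
    mask   : optional boolean array, True where a cell belongs to the continent
    sigma, window, top_n : illustrative defaults (assumptions, not tuned values)
    """
    # Smooth the counts so that single noisy bins do not register as peaks.
    smoothed = gaussian_filter(np.asarray(hist2d, dtype=float), sigma=sigma)

    # Zero out everything outside the continent, if a mask is given.
    if mask is not None:
        smoothed = np.where(mask, smoothed, 0.0)

    # A cell is a peak if it equals the maximum of its neighbourhood and is non-zero.
    is_peak = (smoothed == maximum_filter(smoothed, size=window)) & (smoothed > 0)

    # Sort the peaks by smoothed value, strongest first, and keep the top_n.
    rows, cols = np.nonzero(is_peak)
    order = np.argsort(smoothed[rows, cols])[::-1][:top_n]
    return [(int(rows[i]), int(cols[i]), float(smoothed[rows[i], cols[i]])) for i in order]
```

With the transposed `HIST2D` from above, a peak at `(row, col)` would sit at longitude `0.5 * (x_edges[col] + x_edges[col + 1])` and latitude `0.5 * (y_edges[row] + y_edges[row + 1])`. A larger `window` suppresses secondary peaks that lie close to a stronger one.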
In [63]:
plot_my_second_figure()

Conclusions¶

We can compare this to the key paper's results, and they seem somewhat consistent. The key paper states that a yearly analysis could be used to track climate change. Also, anyone familiar with topography can see that the hotspot locations on each continent mostly fall under one climate. Most hotspots are located in:

  1. The Congo Basin in Africa
  2. The west foothills of the Alps in Europe
  3. The west coastline of Australia
  4. The west foothills of the Himalayas in Asia

Goal #2: comparing other data with LIS ISS' data¶

In order to get a deeper understanding and offer an explanation, we have to compare these results with other datasets and draw conclusions about what may be behind the phenomenon. Personally, I like the idea of wind pushing humid air over dry land, where it meets an obstructing surface (mountains). This would mean that the wind speed is generally lower in these areas. The problem is that most wind data is either a forecast, not meant to be used for scientific models, or behind a paywall. In this case I have chosen a dataset that is intended for observation only but should be good enough here!

Problems with wind¶

The problem with using wind measurements is that, as we instinctively know, wind speed and direction change even within a single day, while the lightning hotspots are observed over a much longer time interval. To choose (at least) the best day to describe, we need to do a seasonal or daily analysis. Let's get to it!
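
One way to pick a representative day is to count flashes per day inside a region and take the busiest one. The sketch below is only an illustration under stated assumptions: the helper names `in_box` and `busiest_day` are hypothetical, a bounding box stands in for the shapefile mask, and the `time` column is assumed to hold seconds since 1993-01-01 (the TAI93 convention, leap seconds ignored). If `simplified.csv` stores time differently, `EPOCH` has to be adjusted.

```python
from datetime import datetime, timedelta

import numpy as np

# ASSUMPTION: times are seconds elapsed since 1993-01-01 00:00 (TAI93).
EPOCH = datetime(1993, 1, 1)
SECONDS_PER_DAY = 86400


def in_box(lon, lat, lon_min, lon_max, lat_min, lat_max):
    """Boolean mask of points inside a bounding box (stand-in for a shapefile mask)."""
    lon = np.asarray(lon)
    lat = np.asarray(lat)
    return (lon >= lon_min) & (lon <= lon_max) & (lat >= lat_min) & (lat <= lat_max)


def busiest_day(times, in_region):
    """Return (date, flash_count) for the day with the most flashes in the region."""
    selected = np.asarray(times, dtype=float)[np.asarray(in_region, dtype=bool)]
    if selected.size == 0:
        raise ValueError("no flashes inside the selected region")

    # Integer day number since EPOCH for every selected flash.
    day_index = np.floor(selected / SECONDS_PER_DAY).astype(int)

    # Count flashes per day and pick the day with the highest count.
    days, counts = np.unique(day_index, return_counts=True)
    best = int(np.argmax(counts))
    return (EPOCH + timedelta(days=int(days[best]))).date(), int(counts[best])
```

With the data frame loaded earlier, a call could look like `busiest_day(simplified_data["time"], in_box(simplified_data["long"], simplified_data["lat"], 10, 35, -10, 10))`, where the box limits are made-up numbers for illustration.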

In [65]:
plot_my_third_figure()
In [67]:
plot_my_fourth_figure()

${\bf \text{Whoops, something is not okay!}}$

The problem is that the data available on kooplex only runs until 10/06. In addition, NASA's data warehouse for the LIS ISS data moved from version 1.0 to 2.0, so the older data (and with it the missing part of 2020) is hard to access.

${\bf \text{But this data covers a significant part of the year!}}$

Getting the wind data¶

The wind data sources used are:

https://data.remss.com/ccmp/v02.1.NRT/

https://images.remss.com/figures/measurements/ccmp/Mears_2019_CCMP_NRT_JGR.pdf

The problem is that this dataset uses the netCDF4 file format.
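
Once a file is open, its variables behave much like arrays. The sketch below shows one possible way to turn the wind components into a speed field on the same longitude convention as the lightning histogram. It is hedged on several assumptions: the function name `wind_on_lightning_grid` is hypothetical, and the variable names `uwnd`, `vwnd`, `longitude`, `latitude`, the `(time, lat, lon)` axis order and the 0 to 360 longitude range should all be checked against `print(dataset.variables)` for the actual files.

```python
import numpy as np


def wind_on_lightning_grid(variables, time_index=0):
    """Return (lon, lat, speed) with longitudes in [-180, 180), ascending.

    `variables` is any mapping from name to array, for example the
    `.variables` attribute of an open netCDF4.Dataset.
    ASSUMPTION: variable names and the (time, lat, lon) axis order.
    """
    # np.ma.filled turns masked (missing) values into NaN and leaves plain arrays alone.
    lon = np.ma.filled(variables["longitude"][:], np.nan).astype(float)
    lat = np.ma.filled(variables["latitude"][:], np.nan).astype(float)
    u = np.ma.filled(variables["uwnd"][time_index], np.nan).astype(float)
    v = np.ma.filled(variables["vwnd"][time_index], np.nan).astype(float)

    # Wind speed is the magnitude of the (u, v) vector.
    speed = np.hypot(u, v)

    # Map 0..360 longitudes to -180..180 and reorder the columns to match.
    lon = (lon + 180.0) % 360.0 - 180.0
    order = np.argsort(lon)
    return lon[order], lat, speed[:, order]
```

For a dataset opened as `ds = netCDF4.Dataset(path, "r")`, the call would be `wind_on_lightning_grid(ds.variables)`. Taking a plain mapping keeps the function easy to try out with small NumPy arrays before pointing it at real files.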

In [49]:
# Preparation
import netCDF4

path_w = "E:/_ELTE_PHYS_MSC/3_third_semester/datascience/data/wind_data/my_days/"
file1 = "CCMP_RT_Wind_Analysis_20200120_V02.1_L3.0_RSS.nc"
file2 = "CCMP_RT_Wind_Analysis_20200316_V02.1_L3.0_RSS.nc"
file3 = "CCMP_RT_Wind_Analysis_20200613_V02.1_L3.0_RSS.nc"
file4 = "CCMP_RT_Wind_Analysis_20200718_V02.1_L3.0_RSS.nc"
file5 = "CCMP_RT_Wind_Analysis_20200728_V02.1_L3.0_RSS.nc"
file6 = "CCMP_RT_Wind_Analysis_20200916_V02.1_L3.0_RSS.nc"

# Open one file (one selected day) per continent, read-only
au_wind = netCDF4.Dataset(path_w + file1, "r", format="NETCDF4")
sa_wind = netCDF4.Dataset(path_w + file2, "r", format="NETCDF4")
eu_wind = netCDF4.Dataset(path_w + file3, "r", format="NETCDF4")
na_wind = netCDF4.Dataset(path_w + file4, "r", format="NETCDF4")
as_wind = netCDF4.Dataset(path_w + file5, "r", format="NETCDF4")
af_wind = netCDF4.Dataset(path_w + file6, "r", format="NETCDF4")

# Collect the open datasets in a fixed order
net_cdfs = [au_wind, sa_wind, eu_wind, na_wind, as_wind, af_wind]
In [69]:
plot_my_stream_figure()
<ipython-input-68-6dff51d65a9e>:46: UserWarning: Tight layout not applied. The left and right margins cannot be made large enough to accommodate all axes decorations. 
  fig.tight_layout()
In [72]:
plot_my_sixth_figure()
In [73]:
plot_my_seventh_figure()
In [74]:
plot_my_eight_figure()

SOURCES¶

[1] ${\bf \text{Where Are the Lightning Hotspots on Earth?}}$, Rachel I. Albrecht, Steven J. Goodman, Dennis E. Buechler, Richard J. Blakeslee, and Hugh J. Christian. 01 Nov. 2016.

[2] ${\bf \text{Remote Sensing Systems Cross-Calibrated Multi-Platform (CCMP) 6-hourly ocean vector wind analysis}}$ product on 0.25 deg grid, Version 2.0, Wentz, F.J., J. Scott, R. Hoffman, M. Leidner, R. Atlas, J. Ardizzone, 2015: Remote Sensing Systems, Santa Rosa, CA.

Links:

LIS ISS data online: https://ghrc.nsstc.nasa.gov/lightning/data/data_lis_iss.html

WIND DATASET: https://data.remss.com/ccmp/v02.1.NRT/

Github repository for the work: https://github.com/AdamGTaylor/DataScience_Lab_2021

Great 3D map for Wind dataset: https://www.nnvl.noaa.gov/weatherview/index.html